Abstract: Nucleotide (nt) and deduced amino acid (AA) sequences of 11 flaviviruses
were compared for each genome segment, using yellow fever (YF) virus as the reference.
Two nonstructural proteins, NS3 and NS5, were highly conserved, while the core protein (C)
was the least conserved among these viruses. In general, nonstructural proteins were more
highly conserved than structural proteins. A consensus tripeptide sequence (Gly-Asp-Asp)
of viral RNA polymerases was found at almost the same position in the NS5 protein of all
flaviviruses. In the envelope glycoprotein (E) and the NS1 protein, 12 Cys residues were
conserved among all flaviviruses, and the E protein possessed some highly conserved regions
which could be essential for its structure and function. Sequence homology among the 11
flaviviruses classified them into 3 to 4 subgroups, which roughly corresponded to the
subgroups of classical serology. A hexapeptide sequence, Asn-Met-Leu-Lys-Arg-Gly,
similar to a sequence in some nucleic acid-binding proteins, was conserved in 6 of the 11
flavivirus core proteins (C).
INTRODUCTION


More than 60 animal viruses belong to the family Flaviviridae, and they have been
classified into 8 serological complexes or subgroups by cross-neutralization using
polyclonal antisera (Porterfield, 1980; de Madrid and Porterfield, 1974; Calisher et al.,
1989). Most flaviviruses are transmitted among susceptible vertebrates by
hematophagous arthropods in nature, and are therefore regarded as arthropod-borne
vertebrate viruses, or arboviruses (Berge, 1975). Many flaviviruses have been recognized
as human or veterinary pathogens of public health importance (Shope, 1980; Monath, 1986);
for example, yellow fever (YF), Japanese encephalitis (JE), dengue fever (DEN), Murray
Valley encephalitis (MVE), St. Louis encephalitis (SLE) and tick-borne encephalitis (TBE).

Since the entire nucleotide (nt) sequences and deduced amino acid (AA) sequences were
determined for YF (Rice et al., 1985) and West Nile (WN) viruses (Castle et al., 1985, 1986;
Wengler et al., 1985), the genomes of other flaviviruses have been completely or partially
sequenced. The results showed that flaviviruses share a common genomic organization: a single long


Received for Publication, February 26, 1990. 
Contribution No. 2386 from the Institute of Tropical Medicine, Nagasaki University. 




open reading frame (ORF) encoding a polyprotein of about 3,500 AA residues in a linear,
single-stranded, positive-sense RNA of about 11,000 nt residues (Westaway et al., 1986).
The 5’ terminal one-fourth of the ORF encodes 3 structural proteins: core protein (C),
the precursor (PrM) of membrane protein (M), and envelope glycoprotein (E). The remaining
part of the ORF encodes 7 nonstructural proteins: NS1, ns2a, ns2b, NS3, ns4a, ns4b and
NS5. Recently, Speight et al. (1988) postulated different cleavage sites, following a
Val-X-Ala tripeptide, for the ns2a and ns4b proteins; cleavage at these sites by host signal
peptidase (von Heijne, 1985) would redefine the proteins as NS2A and NS4B. In this report,
these cleavage sites were used to compare flavivirus sequences with the YF sequence.
Comparative nt and AA sequence analysis of flaviviruses was performed to obtain the basic
information required to understand their biological characteristics and replication
strategies at the molecular level, and to develop second-generation vaccines.


MATERIALS AND METHODS 


Sources of the nt and deduced AA sequence information: Flavivirus sequence
data were taken from the following: YF (Rice et al., 1985); WN (Castle et al., 1985;
Wengler et al., 1985); MVE (Dalgarno et al., 1986); SLE (Trent et al., 1987); JE (Sumiyoshi
et al., 1987); Kunjin (KUN: Coia et al., 1988); type 1 DEN (DEN1: Mason et al., 1987); type
2 DEN (DEN2: Hahn et al., 1988; Deubel et al., 1988); type 3 DEN (DEN3: Osatomi, 1988);
type 4 DEN (DEN4: Mackow et al., 1987; Zhao et al., 1986); TBE (Mandl et al., 1989).


Simple comparison of nt and AA sequences: Virus genome sequences were
divided into segments according to Rice et al. (1985), except for the ns2a and ns2b genes,
which were divided as NS2A and NS2B according to Speight et al. (1988). Comparison of nt and
AA sequences was first carried out for each viral genome segment without alignment.


Comparison of nt and AA sequences after optimal alignment: Since the
flaviviruses differed somewhat in the length of each genome segment, it was
necessary to compare the sequence data after alignment, inserting gaps or skipping
residues to obtain maximal matching of the corresponding sequences. This procedure was
carried out with the DNASIS Version 6.0 software system (Hitachi, 1989) for all 11
flaviviruses, and homology was analyzed by the method of Lipman and Pearson (1985).
Homology was presented as a “score” instead of a percentage in order to take into account
not only identical AA but also AA belonging to the same group and the frequency of
AA usage. When the same nt was found at a position in the 2 compared sequences, the
“score” gained 4 points, while -2 points were allocated for a mismatch. In order to compare
segment sequences of different lengths, each “score” was divided by the segment length.
For AA comparison, “points” ranging from -8 to +17 were assigned as shown in Table 1.
Kyte and Doolittle’s program (1982) in the DNASIS system was used to compare AA homology
and hydrophobicity. The secondary structure of some AA sequences was predicted by the
program of Chou and Fasman (1974).
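
The nt scoring rule described above can be illustrated with a minimal Python sketch. It is not the DNASIS implementation: the function name, the gap symbol, the treatment of a gap as a mismatch, and the use of the aligned length as the divisor are all assumptions made here for illustration. Only the +4 and -2 point values come from the text.

```python
MATCH_POINTS = 4      # identical nt at an aligned position (value from the text)
MISMATCH_POINTS = -2  # different nt at an aligned position (value from the text)
GAP = "-"             # assumed gap symbol; the text does not describe gap handling


def nt_score_per_length(seq_a: str, seq_b: str) -> float:
    """Score two pre-aligned nt sequences and divide by the aligned length."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    total = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == GAP or b == GAP:
            total += MISMATCH_POINTS  # assumption: a gap is scored like a mismatch
        elif a == b:
            total += MATCH_POINTS
        else:
            total += MISMATCH_POINTS
    return total / len(seq_a)
```

Under these assumptions two identical segments score 4.0 per residue, and every mismatch lowers the segment total by 6 points before division by the length.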
Table 1. AA homology score
The following one-letter and three-letter codes for AA were used: A: Ala, R: Arg, N:
Asn, D: Asp, C: Cys, Q: Gln, E: Glu, G: Gly, H: His, I: Ile, L: Leu, K: Lys, M:
Met, F: Phe, P: Pro, S: Ser, T: Thr, W: Trp, Y: Tyr, V: Val
RESULTS AND DISCUSSION


Numbers (No.) of nt and AA residues in flavivirus genome segments
and simple comparison of their sequences: All flaviviruses possessed similar No. of
nt and AA residues in each genome segment, with some variation, and the No. was
identical for the M and NS1 protein genes of all flaviviruses. In the NS2B and NS4A protein
genes, the No. differed by no more than 2 AA residues (Table 2). Table 3 shows
homology by simple comparison, which is presented graphically in Fig. 1.

Comparative nt and AA sequences of flaviviruses after optimal alignment:
Since simple comparison provided little useful information on flavivirus
homology, each viral genome segment was compared after optimal alignment, as summarized
in Tables 4a, 4b and Figs. 2a, 2b. The results showed that the NS3 and NS5 proteins were
the most highly conserved, while the C protein was the least conserved among flaviviruses
when their sequences were compared with the YF data. In addition to this high homology,
these proteins showed a low degree of deviation in homology from YF virus. Since NS3 and NS5
are nonstructural proteins, these results were reflected in the higher homology of
nonstructural proteins than of structural proteins as a whole, as shown by the averaged
homology score in
Table 2. Number of nt in virus genome segments. Abbreviations of virus names are given in the text.
Table 3. Homology % of nt without alignment. Abbreviations of virus names are given in the text.
Fig. 1. Graphic presentation of nt homology % without alignment. Abbreviations
of virus names are given in the text.
Table 5. A gradient of flavivirus relationships was deduced by comparing the total score of
the structural protein region, in the order YF > SLE > MVE > WN = JE > KUN > DEN1, 2,
3 > DEN4 > TBE. Since SLE, MVE and DEN1 viruses were only partially sequenced,
a similar relationship could not be obtained for nonstructural proteins when these viruses
were included.


Hydrophobicity profiles of flavivirus proteins: All 11 flaviviruses exhibited
essentially similar hydrophobicity profiles. Although the NS2A and NS2B proteins showed the
least AA sequence homology, their hydrophobicity profiles were highly conserved and
hydrophobic, as reported by Rice et al. (1986).
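
A hydrophobicity profile of this kind can be sketched in Python as a sliding-window average over the published Kyte and Doolittle (1982) hydropathy scale. This is an illustration only and not the DNASIS routine; the function name and the default window of 9 residues are assumptions, since the window length used in this study is not stated.

```python
# Kyte and Doolittle (1982) hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence: str, window: int = 9) -> list[float]:
    """Return the mean hydropathy of every full window along the sequence.

    The default window of 9 residues is an assumption for illustration.
    """
    sequence = sequence.upper()
    if window < 1 or window > len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    values = [KYTE_DOOLITTLE[aa] for aa in sequence]  # KeyError on unknown codes
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]
```

Two proteins with little sequence identity can still give similar profiles, because many substitutions exchange residues of similar hydropathy (for example Ile, Leu and Val). That is consistent with the observation above for NS2A and NS2B.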


Table 4a. Homology score of nt / length with optimal alignment. Abbreviations of virus
names are given in the text.

Table 4b. Homology score of AA / length with optimal alignment. Abbreviations of virus
names are given in the text.

Table 5. Average of homology score / length in flavivirus structural and nonstructural
proteins. Abbreviations of virus names are given in the text.
Fig. 2a. Graphic presentation of nt / length homology score with optimal alignment,
plotted by genome segment. Abbreviations of virus names are given in the text.

Fig. 2b. Graphic presentation of AA / length homology score with optimal alignment,
plotted by genome segment. Abbreviations of virus names are given in the text.
Consensus sequence of NS3 and NS5 proteins: Since the NS3 and NS5 proteins of
all 11 flaviviruses were highly conserved, their common domains are likely to be essential
for viral survival and replication. Rice et al. (1985, 1986) and Strauss and Strauss (1986)
postulated that the NS5 protein could be a component of the viral RNA polymerase because it
contains the consensus tripeptide sequence Gly-Asp-Asp (G-D-D) of viral RNA polymerases
(Kitamura et al., 1981; Goelet et al., 1982). The present study showed that all 11
flaviviruses possessed this tripeptide in the NS5 protein at almost the same position,
AA No. 3149 to 3195, supporting this postulation. In addition, an NTP-binding motif
sequence, Gly-X-Gly-Lys (G-X-G-K), of ATP-dependent DNA helicases (Gorbalenya et al., 1988)
was found in the flavivirus NS3 protein (Takegami, personal communication). This protein was
also considered to function as a viral protease because it shares a consensus sequence with
serine proteases (Bazan and Fletterick, 1989).


Basic AA content of C protein: The C protein was rich in basic amino acids such as
Arg and Lys (Arg: 8.04 to 14.41; Lys: 8.93 to 12.90 mol %) compared with other viral
proteins. The content of Arg and Lys in the viral proteins as a whole was clearly lower
(Arg: 4.97 to 7.11; Lys: 4.00 to 6.81 mol %) than in the C protein. This result is
reasonable, since the C protein has to neutralize the negative charge of the viral RNA in
order to complex with it and form the virion core structure.
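
The mol % figures above are residue counts expressed as a percentage of all residues in a protein. A minimal Python sketch of that arithmetic follows; the function name is hypothetical, and the sequence in the usage note is a made-up toy string, not a flavivirus sequence.

```python
from collections import Counter


def mol_percent(sequence: str, residues: str = "RK") -> dict[str, float]:
    """Return the mol % of each requested one-letter AA code in a protein."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("sequence must not be empty")
    counts = Counter(sequence)
    # Counter returns 0 for residues that do not occur in the sequence.
    return {aa: 100.0 * counts[aa] / len(sequence) for aa in residues}
```

For the toy sequence `"RKAAAAAAAA"`, `mol_percent` gives 10 mol % for both Arg (R) and Lys (K). Applied to a real C protein sequence and to the whole polyprotein, the same function would allow the two ranges quoted above to be compared directly.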


Characteristics of M and NS4B proteins: Both the M and NS4B proteins of
flaviviruses showed a high degree of deviation from those of YF virus. When an
attenuated JE virus strain, ML-17, was compared with its virulent parental strain JaOH0566,
the wild strain JaOArS982 (Sumiyoshi et al., 1987) and the classical Nakayama strain
(McAda et al., 1987), several ML-17 strain-specific AA replacements were found in the M and
NS4B proteins (Tanaka et al., 1989). This result suggests that these proteins are not
essential for viral replication and survival, but may be related to virus virulence. Both M
and NS4A proteins were highly hydrophobic, suggesting their localization in viral or host
cell membranes.


Glycosylation of flaviviral proteins: Possible N-glycosylation sites of Asn-X-Thr/Ser
were found in several flaviviral proteins, but not in the NS2B and NS4A proteins (Table 6).


Table 6. Numbers of possible N-glycosylation sites. Abbreviations of virus names are given in the text.
However, the E protein was not always glycosylated, suggesting that glycosylation may not
be essential for its structure and function. For example, the E proteins of KUN and WN
viruses were not glycosylated. In contrast, all 11 flaviviruses possessed 1 to 3
N-glycosylation sites at AA No. 906 to 1,000 of the NS1 protein. The NS1 protein was
detected in cytoplasmic foci of DEN2 and KUN virus-infected cells by immunofluorescence
(Westaway et al., 1987), and could also be released into the infected cell culture fluid.
Glycosylation was considered essential for the transport and secretion of the E and NS1
proteins (Mason et al., 1989).
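
Possible N-glycosylation sites of the kind counted in Table 6 can be located by scanning a deduced AA sequence for the Asn-X-Thr/Ser sequon. The Python sketch below is an illustration, not the procedure used in this study. The function name is hypothetical, and excluding Pro at position X is a commonly applied refinement that is not stated in the text. A match marks only a potential site; as noted above, the E protein was not always glycosylated.

```python
import re

# Asn, then any residue except Pro (assumed refinement), then Ser or Thr.
# The lookahead lets overlapping sequons such as "NNSS" both be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(sequence: str) -> list[int]:
    """Return 1-based positions of the Asn in each potential sequon."""
    return [match.start() + 1 for match in SEQUON.finditer(sequence.upper())]
```

For the toy sequence `"ANGTNPSNNSS"` this reports positions 2, 8 and 9: the Asn at position 5 is skipped because it is followed by Pro, and the overlapping sites at 8 and 9 are both found.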


Disulfide bonds of flavivirus proteins: Nowak and Wengler (1987) reported
that 12 Cys residues were conserved in the E protein among flaviviruses, and the same
observation was also reported for the NS1 protein of TBE (Mandl et al., 1989) and KUN
viruses (Coia et al., 1988). These 12 conserved Cys residues could be important for forming
disulfide bonds and maintaining the integrity of the E and NS1 proteins.


Three-dimensional configuration of flavivirus homology: The above homology
data were transformed into a "distance" between each pair of flaviviruses using a newly
devised computer program, and displayed as 3-dimensional schemes in which viruses with
high homology are located close to each other (Fig. 3). The schemes for several
viral proteins appear to classify the 11 flaviviruses into 3 to 4 groups. MVE and JE
viruses, and WN and KUN viruses, were very closely related to each other, and SLE virus was
also close to them, while DEN1, 2, 3 and 4 viruses formed a single separate group, and YF
and TBE viruses were each located at a separate position. These results generally agreed
with the classical serological subgroups of flaviviruses.


Refined homology analysis of some flavivirus proteins: Although the above
data provide information on the averaged homology for each viral genome segment,
a more refined homology search within each viral protein was required to identify highly
conserved domains. The AA sequences of 6 flaviviruses (YF, WN, JE, KUN, TBE, DEN4)
were optimally aligned and segmented into stretches of 10 AA residues each. The 6 viruses
were compared for each corresponding stretch, and the No. of matched AA pairs was counted
to obtain the fine homology tendency across all 6 viruses. In this analysis, only the
scores of matched AA pairs were used, neglecting the scores for AA in the same
group and the frequency of AA usage. The result presented in Fig. 4 showed almost the same
tendency as the previous analysis, but some peaks of highly conserved sequences, which had
not been identified before, are highlighted in Fig. 5. In general, hydrophobic regions of
the E protein were conserved, and the C, PrM and M proteins also possessed highly conserved
regions (Fig. 5). A hexapeptide sequence, Asn-Met-Leu-Lys-Arg-Gly, was conserved in the C
protein of 6 flaviviruses (WN, JE, KUN, MVE, SLE, DEN4) but not in those of YF and TBE
viruses. This sequence is not peculiar to flavivirus C proteins; similar sequences have
also been reported for glycinamide ribonucleotide transformylase (GART) (Henikoff et al.,
1983), DNA-directed RNA polymerase beta chain (Delcuve et al., 1980), and Escherichia coli
30S ribosomal protein S3 (Brauer and Roming, 1979). Although the significance of this
conserved sequence has not yet been clarified, it would not be surprising if the flavivirus
C protein functioned as an RNA-binding protein.
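
The stretch-wise counting described above can be sketched in Python as follows. This is one plausible reading of the procedure, not the authors' program: the function name and the gap symbol are hypothetical, and a "matched AA pair" is taken here to mean two viruses carrying the same residue at the same aligned position, counted over every pair of viruses.

```python
from itertools import combinations


def matched_pairs_per_stretch(aligned: list[str], stretch: int = 10,
                              gap: str = "-") -> list[int]:
    """Count identical residue pairs in each stretch of an alignment.

    `aligned` holds pre-aligned sequences of equal length. A gap never counts
    as a match (assumption; the text does not describe gap handling).
    """
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("need at least one sequence, all of the same length")
    length = len(aligned[0])
    scores = []
    for start in range(0, length, stretch):
        count = 0
        for col in range(start, min(start + stretch, length)):
            column = [seq[col] for seq in aligned]
            count += sum(1 for a, b in combinations(column, 2)
                         if a == b and a != gap)
        scores.append(count)
    return scores
```

With 6 viruses there are 15 virus pairs, so under this reading a fully conserved stretch of 10 residues scores 150. Whether the ordinate of Fig. 4 is scaled in the same way is not stated in the text.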
Fig. 3. Three-dimensional configuration of flavivirus homology. The schemes for C
protein (A), PrM protein (B), M protein (C), E protein (D), NS1 protein (E) and NS5 protein
(F) are shown. Abbreviations of virus names are given in the text.
Secondary structure and antigenic domain of flavivirus E protein:
Classical serological relationships among flaviviruses were mostly based on the
antigen-antibody reactions of the E protein (Porterfield, 1980). The E protein sequences of
other flaviviruses did not show as high homology to YF as NS5 did, but the degree of
deviation was low (Figs. 2a, 2b). Based on the 6 conserved disulfide bridges of the
flavivirus E protein, Nowak and Wengler (1987) postulated a 2-dimensional model of the E
protein composed of 5 regions: R1, L1, R2, L2 and R3. When the E protein genes of the above
6 flaviviruses were compared, 5 highly conserved regions were identified, 2 of them in the
R1 region and 3 in the R3 region (Fig. 5). Srivastava et al. (1990) suggested the presence
of denaturation-resistant neutralizing epitope(s) on an 8 kDa peptide cleaved from the JE
virus E protein by cyanogen bromide, which corresponds to AA No. 375 to 456 of the E
protein. The secondary structure predicted for this
region (Fig. 6) was almost the same for JE, MVE, KUN and WN viruses, while TBE, YF


Fig. 4. Graphic presentation of homology calculated for each 10 AA residues of
flavivirus proteins, with separate panels for structural and non-structural proteins.
Abscissa: genome segment. Ordinate: scores of homology (AA), given as the number of
matched AA pairs among 6 flaviviruses (YF, WN, JE, KUN, TBE, DEN4).
and DEN4 viruses each showed a characteristic structure. This result agreed well with the
classical serological subgroups of flaviviruses (Porterfield, 1980). Wengler and Wengler
(1989) argued that sequential neutralizing epitopes were not identified on the WN virus E
protein, although a hydrophilic beta-turn structure, which could form a strong antigenic
domain (Cohen et al., 1984; Eisenberg et al., 1985; Pellett et al., 1985), was found (Nowak
and Wengler, 1987). Mutations were not considered to occur at random in the E protein gene,
because some domains are highly conserved. These domains are probably required for
the structure and function of the E protein, such as anchoring in the envelope lipid bilayer
or expressing essential epitopes, as suggested by Castle et al. (1985).
Fig. 5. Conserved regions in flavivirus structural proteins. Abbreviations of virus
names are given in the text.

Fig. 6. Predicted secondary structure of conserved region in flavivirus E proteins.
Symbols distinguish α-helix, β-sheet and β-turn. Abbreviations of virus names are given
in the text.